Skip to content

Translation Software

Alt text

How an interpreter differs from a compiler

  • An interpreter executes the program it is interpreting.
  • A compiler does not execute the program it is compiling.
  • With a compiler the program source code is input and either the object code program or error messages are output.
  • The object code produced can then be executed without needing recompilation.
  • With an interpreter, the program source code is similarly input; there may also be other inputs that the program requires or to correct errors in the source program.
  • No object code is output, but error messages from the interpreter are output, as well as any outputs produced by the program being interpreted.
  • As there is no object code produced from the interpretation process, the interpreter will need to be used every time the program is executed.

  • Both a compiler and an interpreter will construct a symbol table.
  • An interpreter will also allocate space in memory to store any constants, variables and other data items used by the program.
  • The interpreter checks each statement individually and reports any errors, which can be corrected before the statement is executed.
  • After each statement is executed, control is returned to the interpreter so the next statement can be checked before execution.

Alt text

Stages in the compilation of a program

  • The process of translating a source program written in a high-level language into an object program in machine code can be divided into four stages:
    • lexical analysis
    • syntax analysis
    • code generation
    • optimisation

Lexical analysis

  • Lexical analysis is the first stage in the process of compilation.

  • All unnecessary characters not required by the compiler, such as white space and comments, are removed.

  • Before translation, the source program needs to be converted to tokens.

  • This process is called tokenisation.

  • Source program

Alt text

// My addition program Version 2
DECLARE x, y, z : INTEGER
OUTPUT "Please enter two numbers to add together"
INPUT x, y
z=x+ y
OUTPUT "Answer is ", z
  • The output from the lexical analysis is a tokenised list stored in main memory.
  • Here is the output for the first four lines of the program:

31 81 04 82 04 83 03 34 84 33 81 04 82 83 01 81 02 82

  • Source program after unnecessary characters have been removed

Alt text

DECLARE x, y, z : INTEGER
OUTPUT "Please enter two numbers to add together"
INPUT x, y
z=x+ y
OUTPUT "Answer is ", z

Syntax analysis

  • Syntax analysis is the next stage in the process of compilation.

  • In this stage, output from the lexical analysis is checked for grammatical (syntax) errors.

  • For example, this source program statement:

    z + x + y

  • would produce this tokenised list:

    83 02 81 02 82

    error = (01) expected

  • As shown above, the complete tokenised list is checked for errors using the grammatical rules for the programming language. - This is called parsing. The whole program goes through this process even if errors are found. The rules for parsing can be set out in

Backus-Naur form (BNF) notation

  • If any errors are found, each statement and the associated error are output but, in the next stage of compilation, code generation will not be attempted.

Code generation

  • The code generation stage produces an object program to perform the task defined in the source code.
  • The program must be syntactically correct for an object program to be produced.
  • The object program is in machine-readable form (binary).
  • It is no longer in a form that is designed to be read by humans.
  • This object program is either in machine code that can be executed by the CPU, or in an intermediate form that is converted to machine code when the program is loaded.
  • The latter option allows greater flexibility.
  • For example, intermediate code can support:
    • the use of relocatable code so the program can be stored anywhere in main memory
    • the addition of library routines to the code at this stage to minimise the size of the stored object program
    • the linking of several programs to run together.

Optimisation

  • The optimisation stage supports the creation of an efficient object program.
  • Optimised programs should perform the task using the minimum amount of resources.
  • These include time, storage space, memory and CPU use.
  • Some optimisation can take place after the syntax analysis or as part of code generation.

A simple example of code optimisation is shown here

Alt text

Syntax diagrams and Backus-Naur form

  • The grammatical rules or syntax for a programming language need to be set out clearly so a programmer can write code that obeys the rules, and a compiler can be built to check that a program obeys these rules.
  • The rules can be shown graphically in a syntax diagram or using a meta language such as Backus-Naur form (BNF) notation, which is a formal method showing syntax as a list of replacement rules to represent the algorithm used in syntax analysis.

Syntax diagrams

Alt text

Backus-Naur form

  • BNF uses a set of symbols to describe the grammar rules in a programming language.
  • BNF notation includes:
< >     used to enclose an item
::=      separates an item from its definition
|         between items indicates a choice
;          the end of a rule

<variable> ::= <letter> | <digit> ;
<letter>   ::= A | B |C ;
<digit>    ::= 1 | 2 |3 ;

<variable> ::= <letter> | <variable> <letter> ;

Reverse Polish notation (RPN)

  • Reverse Polish notation (RPN) is a method of representing an arithmetical or logical expression without the use of brackets or special punctuation. RPN uses postfix notation, where an operator is placed after the variables it acts on.
  • For example, A + B would be written as A B +.
  • Compilers use RPN because any expression can be processed from left to right without using any back tracking.
  • Here, we will look at how RPN can be formed by using operator precedence (brackets, multiplication and division, addition and subtraction).

Alt text

  • To evaluate the expression using a stack
    • the values are added to the stack in turn going from left to right
    • when an operator is encountered, it is not added to the stack but used to operate on the top two values of the stack – which are popped off the stack, operated on, then the result is pushed back on the stack
    • this is repeated until there is a single value on the stack and the end of the expression has been reached.

TIP

  • If A = 2, B = 3 and C = 4, then A B C * - is evaluated using a stack

Alt text